Morpho-Syntactic Study of Errors from Speech Recognition System
Authors
Abstract
The study offers an original perspective on speech transcription errors by focusing on the morpho-syntactic features of the erroneous chunks and of their surrounding left and right contexts. The typology covers the forms, lemmas and POS involved in erroneous chunks and in the surrounding contexts. Comparisons with error-free contexts are also provided. The study is conducted on French. The morpho-syntactic analysis shows that three main classes are particularly well represented in the erroneous chunks: (i) grammatical words (to, of, the), (ii) auxiliary verbs (has, is), and (iii) modal verbs (should, must). Such items occur widely in ASR outputs and are frequent candidates for transcription errors. The analysis of the context indicates that some left 3-gram contexts (e.g., repetitions, that is, disfluencies; bracketing formulas such as “c’est”; etc.) may be better predictors than others. Finally, the surface analysis, based on Levenshtein distance, shows that the most common distance is 2 characters and mainly involves differences between inflected forms of the same item.
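To make the surface analysis concrete, here is a minimal sketch of a character-level Levenshtein distance applied to pairs of inflected French forms. The abstract does not say how the distance was implemented, so the function below is a generic textbook version, and the word pairs are hypothetical illustrations rather than data from the study.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # Normalize to NFC so an accented letter such as "é" counts as one character.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical (reference, ASR hypothesis) pairs, not taken from the study.
pairs = [("mangé", "manger"), ("parle", "parlent")]
distances = {pair: levenshtein(*pair) for pair in pairs}
```

Both illustrative pairs are inflected forms of a single lemma and come out at a distance of 2 characters, which is the kind of case the abstract describes as most common.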
Similar resources
Comparison of the high-frequency morpho-syntactic structures of cochlear implant children and children with normal hearing aged 4-6 years
Introduction: Children with cochlear implants experience problems in all language domains, and have more problems with morpho-syntactic skills than in other domains. Considering the importance of morphology and syntax in the development of children's communication skills, this study compared the use of high-frequency morpho-syntactic structures among 4-6-year-old children with cochlear implants and ty...
Recognition Assistance - Treating Errors in Texts Acquired from Various Recognition Processes
Texts acquired from recognition sources—continuous speech/handwriting recognition and OCR—generally have three types of errors regardless of the characteristics of the source in particular. The output of the recognition process may be (1) poorly segmented or not segmented at all; (2) containing underspecified symbols (where the recognition process can only indicate that the symbol belongs to a ...
A Study on Morpho-Syntactic Patterns: A Cohesive Device in Some Persian Live Sport Radio and TV Talks
Morpho-syntactic patterns are a subcategory of cohesive devices that help hearers form an adequate mental representation for understanding speech. This article investigates the morpho-syntactic patterns employed in some Persian live sport radio and TV programs, adapting Dooley and Levinsohn’s theoretical and analytical framework. The research data includes around 30,000 ...
Ever decreasing circles: Speech production in semantic dementia.
We explored the impact of a degraded semantic system on lexical, morphological and syntactic complexity in language production. We analysed transcripts from connected speech samples from eight patients with semantic dementia (SD) and eight age-matched healthy speakers. The frequency distributions of nouns and verbs were compared for hand-scored data and data extracted using text-analysis softwa...
Informations morpho-syntaxiques et adaptation thématique pour améliorer la reconnaissance de la parole
A way to improve outputs produced by automatic speech recognition (ASR) systems is to integrate additional linguistic knowledge. Our research in this field focuses on two aspects: morpho-syntactic information and thematic adaptation. In the first part, we propose a new mode of integration of parts of speech in a post-processing stage of speech decoding. To do this, we tag N-best sentenc...